Poster: bleblanc (Administrator, Curator, or Staff) Date: Apr 3, 2004 9:15am
Forum: etree Subject: Re: harping on the parser

You could just update the info file formats before importing them until Jon adds this. I'm sure that nobody will care if you update the numbering formats. I only say this because the Archive folks kinda have full plates right now...

http://www.archive.org/audio/etree-details-db.php?id=48 - delete the "close parenthesis" and the problem is solved.

http://audio12-bu.archive.org/0/audio/rad2002-05-04.shnf/rad2002-05-04.txt - remove the empty space before the track numbers and add a period after them.
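For anyone who would rather script the second fix than edit by hand, here is a minimal sketch in Python (not Perl, and not the Archive's actual parser). It assumes the problem lines look like " 01 Song Title", and that "01. Song Title" is a form the parser accepts; both are guesses based on the advice above. The name normalize_track_line is made up for illustration.

```python
import re

# Assumed input shape: optional leading whitespace, a 1-2 digit track
# number, at least one space, then the title. This is a guess at what
# the problem lines look like, not a description of the real parser.
TRACK_LINE = re.compile(r"^\s*(\d{1,2})\s+(\S.*)$")


def normalize_track_line(line):
    """Drop whitespace before the track number and add a period after it.

    Expects a single line without its trailing newline. Lines that do not
    look like "<number> <title>" are returned unchanged.
    """
    match = TRACK_LINE.match(line)
    if match is None:
        return line
    number, title = match.groups()
    return f"{number}. {title}"


if __name__ == "__main__":
    for sample in [" 01 Intro", "02. Already Fine", "Disc One"]:
        print(repr(sample), "->", repr(normalize_track_line(sample)))
```

Note that a line such as "16 bit flac" would also be treated as a track, so the output still needs a quick look before it replaces the original file.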

This post was modified by bleblanc on 2004-04-03 17:15:08

Poster: xtifr Date: Apr 3, 2004 9:57am
Forum: etree Subject: Re: harping on the parser

These aren't shows I've uploaded/imported, these are shows that are (or were) on the needs-meta list. Jon said there are "2,745 shows to go". I'm suggesting that minor improvements to the parser could perhaps make many hundreds of those go away automagically.

As a free software developer/volunteer myself, I definitely understand the lack-of-time issue. And as an experienced Perl coder, I'd be more than glad to help build a better parser -- if Jon wants to post or email me the regexp he's using now.

Poster: brewster (Administrator, Curator, or Staff) Date: Apr 4, 2004 6:42am
Forum: etree Subject: correcting metadata-- using better parser or wiki ?

Thank you for the offer to build a better parser. Andy Jewell did one that got us 3000 parses (I then spent a day cleaning up the errors in his parses, and then Jon spent a day integrating it into the system). This extra work, especially for Jon, is something we would like not to repeat.

Therefore, if we can find a way to help this whole process, that would be great. Dreaming here, but what if Perl volunteers reformat the setlists we cannot read into setlists we can? Then we can use those for the template forms and still work within the workflow used before. Does this help?

Another option is to use a wiki to note which shows are done (take them off the list) or why they are hard. I stuck Jon's latest list in our local wiki (I blew part of it, but it works as a straw man to see if this is the way we want to go):

http://homeserver.archive.org/twiki/bin/view/Main/NeedsMetadata

-brewster

Poster: xtifr Date: Apr 4, 2004 7:18am
Forum: etree Subject: Re: correcting metadata-- using better parser or wiki ?

Ah, ok, given the history, I certainly understand the reluctance to engage in any ad-hoc hackery. I was thinking more of incremental changes to the existing parser (which is why I was hoping to see the code), rather than an all-new from-scratch parser. Incremental changes should _in theory_ be a lot simpler to test and apply than a wholesale rewrite. But I realize that theory and practice are two different things.

Your idea of creating new text files with perl (or PHP) isn't bad. I'll play around with that.

The Wiki idea is interesting, but looks high-maintenance at the moment. I'm willing to be convinced otherwise.

I'd still like to see the current parser, because I'd like to adapt it to my own purposes (setting metadata in my local flac files). But I suppose that's a bit of a side issue.

Here's a regular expression fragment that may or may not be of interest to anyone: "^\s?\d\d?\s?(\s|[-.:)])\s*"
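To make it easier to see what that fragment does and does not match, here is a small Python sketch that applies it to a few track-line shapes. The pattern is copied unchanged from the post; the sample lines and the helper name strip_track_prefix are made up for illustration, and nothing here reflects the regexp the Archive's parser actually uses.

```python
import re

# The fragment from the post, unchanged: at most one leading whitespace
# character, a 1-2 digit track number, then either whitespace or one of
# "-", ".", ":", ")" as the separator, then any trailing whitespace.
TRACK_PREFIX = re.compile(r"^\s?\d\d?\s?(\s|[-.:)])\s*")


def strip_track_prefix(line):
    """Return the line with a leading track-number prefix removed, if any."""
    return TRACK_PREFIX.sub("", line, count=1)


if __name__ == "__main__":
    samples = [
        " 1. Intro",          # leading space, period
        "01 Intro",           # bare number followed by a space
        "12 - Jam",           # dash separator
        "3) Encore",          # close parenthesis
        "2002-05-04 Venue",   # a date, not a track number
        "  1. Intro",         # two leading spaces
    ]
    for sample in samples:
        print(repr(sample), "->", repr(strip_track_prefix(sample)))
```

The first four samples lose their prefix. The date line is left alone because a third digit follows the first two, and the last sample is also left alone because the fragment tolerates only a single leading whitespace character.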

Poster: MTravellerH (Administrator, Curator, or Staff) Date: Apr 5, 2004 2:42am
Forum: etree Subject: Re: correcting metadata-- using better parser or wiki ?

I have noticed a big problem concerning flac files and the parser. Somehow they seem to be a lot more critical than our normal shn setlists. But this is surely due to bad fortune. Can't see a really good reason why this should be the case. :) Oh btw, I simply cut and paste if the setlist is wrong. A parser is always bound to work imperfectly (shrug). It would be nice, though, if setlists and tracks coincided better.

Poster: alienbobz (Administrator, Curator, or Staff) Date: Apr 4, 2004 7:59am
Forum: etree Subject: Metadata update, recommendation

I just went through a few of the shows that need metadata. All of the Jack Johnson shows are ones that have been removed at the taper's request. I can't fix the metadata because the show is no longer there.

I also have another recommendation for making metadata "easier". I think text files should become mandatory. They do help out with the source and everything else. I have run into many shows that don't even have them and I just find that to be very tacky. So, how about the ones that don't have them, maybe we could make simple text files with all of the info? Let me know what you think.

Jarod

Poster: bleblanc (Administrator, Curator, or Staff) Date: Apr 4, 2005 2:12am
Forum: etree Subject: Re: Metadata update, recommendation

"I have run into many shows that don't even have them [info files] and I just find that to be very tacky"

Hi Jarod, in situations like this feel free to put one together and email it to me (as an attachment) bleblanc ATTTT archive DOTTTTT org

Please include the following in your email:

1. Server that the show resides on.
2. Path to the show folder (/1/audio/sci2003-12-31.flac16)
3. Link to the details page.

I will go through and add these as folks send them to me. I agree, it's a pain to find shows that don't have them....

-Brad

This post was modified by Brad Leblanc on 2005-04-04 09:12:30

Poster: bleblanc (Administrator, Curator, or Staff) Date: Apr 3, 2004 1:05pm
Forum: etree Subject: Re: harping on the parser

Gotcha xtifr, makes sense
